Abstract

Modeling molecules as undirected graphs and chemical reactions as graph rewriting operations
is a natural and convenient approach to modeling chemistry. Graph grammar rules are
most naturally employed to model elementary reactions like merging, splitting, and isomerisation
of molecules. It is often convenient, in particular in the analysis of larger systems, to summarize
several subsequent reactions into a single composite chemical reaction. We use a generic
approach for composing graph grammar rules to define chemically useful rule compositions.
We iteratively apply these rule compositions to elementary transformations in order to automatically
infer complex transformation patterns. This is useful, for instance, to understand the
net effect of complex catalytic cycles such as the Formose reaction. The automatically inferred
graph grammar rule is a generic representative that also covers the overall reaction pattern of
the Formose cycle, namely two carbonyl groups that can react with a bound glycolaldehyde to a
second glycolaldehyde. Rule composition can also be used to study polymerization reactions as
well as more complicated iterative reaction schemes. Terpenes and polyketides, for instance,
form two naturally occurring classes of compounds of utmost pharmaceutical interest that can
be understood as "generalized polymers" consisting of five-carbon (isoprene) and two-carbon
units, respectively.

1 Introduction 

Directed hypergraphs [8] are a suitable topological representation of (bio) chemical reaction networks 
where (catalytic) reactions are hyperedges connecting substrate nodes to product nodes. Such 

networks require an underlying Artificial Chemistry [3] that describes how molecules and reactions 
are modeled. If molecules are treated as edge and vertex labeled graphs, where the vertex labels 
correspond to atom types and the edge labels denote bond types, then structural change of molecules 
during chemical reactions can be modeled as graph rewriting [2]. In contrast to many other Artificial
Chemistries this approach allows for respecting fundamental rules of chemical transformations like 
mass conservation, atomic types, and cyclic shifts of electron pairs in reactions. In general, a graph 
rewrite (rule) transforms a set of substrate graphs into a set of product graphs. Hence the graph 
rewrite formalism allows not only to delimit an entire chemical universe in an abstract but compact 
form but also provides a methodology for its explicit construction. 

Most methods for the analysis of this network structure are directed towards this graph (or
hypergraph) structure, which is described by the stoichiometric matrix S of the chemical system.
Since S is essentially the incidence matrix of the directed hypergraph, algebraic approaches such 
as Metabolic Flux Analysis and Flux Balance Analysis [10] have a natural interpretation in terms 
of the hypergraph. Indeed typical results are sets of possibly weighted reactions (i.e., hyperedges) 
such as elementary flux modes [12], extreme pathways [11], minimal metabolic behaviors [9], or a
collection of reactions that maximize the production of a desired product in metabolic engineering.
The net reaction of a given pathway is simply the linear combination of the participating
hyperedges. 
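
The linear combination of hyperedges can be illustrated with a minimal Python sketch. The species names and the two-step toy pathway are hypothetical and are not taken from the paper; each reaction is assumed to be given as a pair of substrate and product multisets.

```python
from collections import Counter


def net_reaction(reactions, weights=None):
    """Sum the (weighted) stoichiometric vectors of the given reactions.

    Each reaction is a pair (substrates, products) of dicts mapping a
    species name to its stoichiometric coefficient. In the result,
    negative entries are consumed and positive entries are produced.
    """
    weights = weights or [1] * len(reactions)
    net = Counter()
    for (substrates, products), w in zip(reactions, weights):
        for species, coeff in substrates.items():
            net[species] -= w * coeff
        for species, coeff in products.items():
            net[species] += w * coeff
    # Intermediates that cancel out do not appear in the net reaction.
    return {s: c for s, c in net.items() if c != 0}


# Hypothetical toy pathway: A + B -> C, followed by C -> D.
pathway = [
    ({"A": 1, "B": 1}, {"C": 1}),
    ({"C": 1}, {"D": 1}),
]
net = net_reaction(pathway)
print(net)
```

The intermediate C cancels, so the net reaction is A + B -> D. The stoichiometry alone says nothing about which atoms and bonds change, which is the gap that rule composition is meant to fill.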

In the setting of generative models of chemistry, each concrete reaction is not only associated 
with its stoichiometry but also with the transformation rule operating on the molecules that are 
involved in a particular reaction. Importantly, these rules are formulated in terms of reaction mech- 
anisms that readily generalize to large sets of structurally related molecules. It is thus of interest 
to derive not only the stoichiometric net reaction of a pathway but also the corresponding "effective 
transformation rule". Instead of attempting to address this issue a posteriori, we focus here on 
the possibility of composing the elementary rules of chemical transformations to new effective rules 
that encapsulate entire pathways. 

The motivation comes from the observation that string grammars are meaningfully characterized 
and understood by investigating the transformation rules. Consider, as a trivial example, the 
context-free grammar $\mathcal{G}$ with the starting symbol $S$ and the rules $S \to aS'a$, $S' \to aS'a \mid B$, and
$B \to \varepsilon \mid bB$. Inspecting this grammar we see that we can summarize the effect of the productions
as $B \to b^k$, $k \ge 0$, and $S \to a^n B a^n$, $n \ge 1$. The language generated from $\mathcal{G}$ is thus $\{a^n b^* a^n \mid n \ge 1\}$.
Here we explore whether a similar reasoning, namely the systematic combination of transformation 
rules, can help to characterize the language of molecules that is generated by a particular graph 
rewriting chemistry. Similar to the example from term rewriting above, we should at the very least 
be able to recognize the regularities in polymerization reactions. We shall see below, however, that 
the rule based approach holds much higher promises. 
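
As a small sanity check of the string-grammar example, the following Python sketch enumerates all words the grammar derives up to a length bound and compares them with the summarized language. The symbol `T` stands in for $S'$, and the helper names `generate` and `in_language` are illustrative assumptions.

```python
from collections import deque
from itertools import product

# Productions of the example grammar; "T" plays the role of S'.
RULES = {"S": ["aTa"], "T": ["aTa", "B"], "B": ["", "bB"]}


def generate(max_len):
    """Return all terminal words with at most max_len letters derivable from S."""
    words, seen = set(), {"S"}
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        idx = next((i for i, c in enumerate(form) if c in RULES), None)
        if idx is None:  # no nonterminal left: a word of the language
            words.add(form)
            continue
        for rhs in RULES[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            n_terminals = sum(c not in RULES for c in new)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


def in_language(word):
    """Membership test for {a^n b^k a^n : n >= 1, k >= 0}."""
    if "b" not in word:
        return len(word) >= 2 and len(word) % 2 == 0 and set(word) == {"a"}
    lead = len(word) - len(word.lstrip("a"))
    trail = len(word) - len(word.rstrip("a"))
    return lead == trail >= 1 and set(word.strip("a")) == {"b"}


derived = generate(6)
expected = {"".join(w) for n in range(7) for w in product("ab", repeat=n)
            if in_language("".join(w))}
print(sorted(derived, key=lambda w: (len(w), w)))
```

Up to the chosen bound the derived words and the summarized language coincide, which is the kind of agreement one would hope to obtain for composed graph rewriting rules as well.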

In this contribution we address two issues: First we establish the formal conditions under 
which chemical transformation rules can be meaningfully composed. To this end, we introduce 
in Section 2 rule composition within the framework of concurrency theory. We then discuss the
specific restrictions that apply to chemical systems, leading to the constructive approach to inferring
composed rules in Section 3.

The basic computational task we envision starts from an unordered set $\mathcal{R}$ of reactions such as
those forming a particular metabolic reaction pathway. To derive the effective transformation rule
describing the pathway we need to find the correct ordering $\pi$ in which the transformation rules $p_i$,
underlying the individual chemical reactions $\rho_i$, have to be composed. We illustrate this approach
in some detail using the Formose reaction as an example in Section 4.
2 Graph Grammars and Rule Composition



Graph grammars, or graph rewriting systems, are proper generalizations of term rewrite systems. 
A wide variety of formal frameworks have been explored, including several different algebraic ones 
rooted in category theory. As a model of chemical transformations the so-called double pushout 
(DPO) formulation appears to be best suited. We refer to [5] for the comprehensive treatise. 
In the following sections we first outline the basic setup and then introduce full and partial rule 
composition. 



2.1 Double Pushout and Concurrency 

The DPO formulation of graph transformations considers transformation rules of the form
$p = (L \xleftarrow{l} K \xrightarrow{r} R)$, where $L$, $R$, and $K$ are called the left graph, right graph, and context graph, respectively.
The maps $l$ and $r$ are graph morphisms. The rule $p$ transforms $G$ to $H$, in symbols $G \stackrel{p,m}{\Longrightarrow} H$, if
there is a pushout graph $D$ and a "matching morphism" $m : L \to G$ such that the following diagram
is valid:

$$
\begin{array}{ccccc}
L & \xleftarrow{\;l\;} & K & \xrightarrow{\;r\;} & R \\
\downarrow{\scriptstyle m} & & \downarrow{\scriptstyle k} & & \downarrow{\scriptstyle n} \\
G & \longleftarrow & D & \longrightarrow & H
\end{array}
\tag{1}
$$

The existence of $D$ is equivalent to the so-called gluing condition, which determines whether the
rule $p$ is applicable to a match in $G$. In the following we will also write $G \stackrel{p}{\Longrightarrow} H$ and $G \Rightarrow H$ for
derivations, if the specific match or transformation rule is unimportant or clear from the context.

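Concurrency theory provides a canonical framework for the composition of two graph transformations.
Given two rules $p_i = (L_i \xleftarrow{l_i} K_i \xrightarrow{r_i} R_i)$, $i = 1, 2$, a composition
$(L \xleftarrow{q_l} K \xrightarrow{q_r} R) = p_1 *_E p_2$
can be defined whenever a dependency graph $E$ exists so that in the following diagram:

$$
\begin{array}{ccccccccccc}
L_1 & \xleftarrow{\;l_1\;} & K_1 & \xrightarrow{\;r_1\;} & R_1 & & L_2 & \xleftarrow{\;l_2\;} & K_2 & \xrightarrow{\;r_2\;} & R_2 \\
\downarrow{\scriptstyle u_1} & & \downarrow{\scriptstyle v_1} & (1) & \searrow{\scriptstyle e_1} & & \swarrow{\scriptstyle e_2} & (2) & \downarrow{\scriptstyle v_2} & & \downarrow{\scriptstyle u_2} \\
L & \xleftarrow{\;s_1\;} & C_1 & \xrightarrow{\;t_1\;} & & E & & \xleftarrow{\;s_2\;} & C_2 & \xrightarrow{\;t_2\;} & R \\
 & & & \nwarrow{\scriptstyle w_1} & & (3) & & \nearrow{\scriptstyle w_2} & & & \\
 & & & & & K & & & & &
\end{array}
\tag{2}
$$

the cycles (1) and (2) are pushouts, and (3) is a pullback, see e.g., [7]. We then have $q_l = s_1 \circ w_1$
and $q_r = t_2 \circ w_2$. The concurrency theorem [2] ensures that for any sequence of consecutive direct
transformations $G \stackrel{p_1, m_1}{\Longrightarrow} H \stackrel{p_2, m_2}{\Longrightarrow} G'$ a graph $E$, a corresponding $E$-concurrent rule $p_1 *_E p_2$, and
a morphism $m$ can be found such that $G \stackrel{p_1 *_E p_2,\, m}{\Longrightarrow} G'$.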

In order to use graph transformation as a model for chemical reactions additional conditions 
must be enforced. Most importantly, atoms are neither created, nor destroyed, nor transformed 
to other types. Thus only graph morphisms whose restriction to the vertex sets are bijective are 
valid in our context. In particular, the matching morphism m always corresponds to a subgraph 
isomorphism in our context. The context graph K thus is (isomorphic to) a subgraph of both L 
and R, describing the part of L that remains unchanged in R. Conservation of atoms means that 
the vertex sets of L, K, and R are linked by bijections known as the atom-mapping. When the 
atom mapping is clear, thus, we do not need to represent the context explicitly. 

It is important to note that the existence of the matching morphism $m : L \to G$ alone is
not sufficient to guarantee the applicability of the transformation. In our context, we require in
addition that the transformation rule does not attempt to introduce an edge in $R$ that has been
present already before the transformation is applied. Formally, the gluing condition requires that
$(l(x), l(y)) \notin L$ and $(r(x), r(y)) \in R$ implies $(m(l(x)), m(l(y))) \notin G$.
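
The applicability check can be sketched in Python for the restricted, atom-conserving setting described above. The representation is an assumption made for illustration only (it is not the GGL data structure): graphs and rules store bonds as dictionaries keyed by unordered vertex pairs, and the context $K$ is left implicit as the part shared by $L$ and $R$.

```python
def apply_rule(atoms, bonds, rule, m):
    """Apply `rule` to a molecule graph at match `m`; return the new bonds or None.

    atoms: vertex -> atom type; bonds: frozenset({u, v}) -> bond type.
    rule:  dict with 'label' (vertex -> atom type) and bond dicts 'L' and 'R'
           over the same vertex set (atoms are conserved).
    m:     total map from rule vertices to graph vertices.
    """
    if len(set(m.values())) != len(m):
        return None  # the match must be injective on vertices
    if any(atoms.get(m[v]) != a for v, a in rule["label"].items()):
        return None  # atom types must agree

    def image(edge):
        return frozenset(m[v] for v in edge)

    if any(bonds.get(image(e)) != b for e, b in rule["L"].items()):
        return None  # every bond of L must be present in the graph
    if any(e not in rule["L"] and image(e) in bonds for e in rule["R"]):
        return None  # the rule would create a bond that already exists
    result = dict(bonds)
    for e in rule["L"]:
        del result[image(e)]
    for e, b in rule["R"].items():
        result[image(e)] = b
    return result


# Hypothetical toy data: a rule that forms a C-O single bond.
form_bond = {"label": {0: "C", 1: "O"}, "L": {}, "R": {frozenset({0, 1}): "-"}}
atoms = {10: "C", 11: "O", 12: "C"}
bonds = {frozenset({10, 11}): "-"}

rejected = apply_rule(atoms, bonds, form_bond, {0: 10, 1: 11})
accepted = apply_rule(atoms, bonds, form_bond, {0: 12, 1: 11})
print(rejected, accepted)
```

The first match is rejected because the bond between the matched atoms already exists, while the second match yields a graph with both bonds.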
Fig. 1: Full composition of two rules requires that $L_2$ is (isomorphic to) a subgraph of $R_1$.



2.2 Full Rule Composition 

In the following we will be concerned only with special, chemically motivated, types of rule compositions.
In the simplest case the dependency graph $E$ is isomorphic to $R_1$; later we will also consider
a more general setting in which $E$ is isomorphic to the disjoint union of $R_1$ and some connected
components of $L_2$. For ease of notation, from now on we only refer to a rule composition, and
not to a composition of morphisms as in Section 2.1, i.e., $p_1 *_E p_2$ will be denoted as $p_2 \circ p_1$ (note
that the order of the arguments changes). If $E = R_1$, then $L_2 \cong e_2(L_2)$ is a subgraph of $R_1$. Omitting
the explicit references to the subgraph matching morphism $e_2$ we can simply view $L_2$ as a subgraph
of $R_1$, as illustrated in Figure 1.

The rule composition thus amounts to a rewriting $R_1 \stackrel{p_2}{\Longrightarrow} R$, while the left side $L_1$ is preserved.
We will use the notation $p_2 \circ p_1$ and $G \stackrel{p_2 \circ p_1}{\Longrightarrow} G'$ for this restricted type of rule composition, and
call it full composition as the complete left side of $p_2$ is a subgraph of $R_1$. Note that $L_2$ may fit
into $R_1$ in more than one way so that there may be more than one composite rule. Formally, the
alternative compositions are distinguished by different matching morphisms $e_2$ in the diagram (2);
we will return to this point below.

2.3 Partial Rule Composition 

An important issue for the application to chemical reactions is that the graphs involved in the 
rules are in general not connected. Typical chemical reactions combine molecules, split molecules 
or transfer groups of atoms from one molecule to another. The transformation rules for all these 
reactions therefore require multiple connected components. For the purpose of dealing with these 
rules, we introduce the following notation for graphs and derivations. 




Fig. 2: Partial rule composition requires that at least one connected component (here $L_2$) is isomorphic to $R_1$.
Additional components of the second rule may remain unmatched.
Fig. 3: Composition of two rules from the Formose reaction ($p_1$ and $p_3$); the following rule names will be used: $p_0$:
forward keto-enol tautomerism (which corresponds to the reverse of $p_1$), $p_1$: backward keto-enol tautomerism, $p_2$:
forward aldol addition (which corresponds to the reverse of $p_3$), and $p_3$: backward aldol addition. The atom mapping
and matching morphism are implicitly given in these drawings by corresponding positions of the atoms. The context 
K thus consists of all the atoms as well as the chemical bonds (edges) shown in black in both the left and the right 
graph of each rule. 



Let $Q$ be a graph with $\#Q$ connected components $Q^i$, $i = 1, \dots, \#Q$. It will be convenient
to treat $Q$ as the multiset of its components. A typical chemical graph derivation, corresponding
to a bi-molecular reaction, can be written in the form $\{G^1, G^2\} \stackrel{p}{\Longrightarrow} \{H^1, H^2, H^3\}$, where we take
the notation to imply that all graphs $G^i$ and $H^j$ are connected. We will furthermore insist that
representations of chemical reactions are minimal in the following sense: If the left graph of the rule
$p = (L \leftarrow K \rightarrow R)$ matches entirely within $G^1$, i.e., $m(L) \cap G^2 = \emptyset$, then $G^2$ can be omitted. (In a
chemical rewriting grammar, then, one of the $H^j$ must be isomorphic to $G^2$, becoming redundant
as well.) More formally, we say that a derivation $\{G^1, G^2, \dots, G^{\#G}\} \stackrel{p,m}{\Longrightarrow} \{H^1, H^2, \dots, H^{\#H}\}$ is
proper if

$$\forall i, j:\; G^i \cong H^j \implies G^i \cap m(L) \neq \emptyset .$$

That is, a proper derivation cannot be simplified. If the derivation $G \Rightarrow H$ is proper then $\#G \le \#L$.
The inequality comes from the fact that multiple components of $L$ may easily be matched to a
single component of $G$ while each component of $L$ must match within a component of $G$.

The conditions for the $\circ$ composition of rules are a bit too strict for our applications. We thus
relax them to respect the component structure of left and right graphs. More precisely, we require
that $E$ is isomorphic to a disjoint union of a copy of $R_1$ and some connected components of $L_2$
so that for every connected component $L_2^i$ of $L_2$ it holds that either $e_2(L_2^i) \subseteq e_1(R_1)$ or $e_2(L_2^i)$ is a
connected component of $E$ isomorphic to $L_2^i$. For a rule composition of this type to be well defined
we need that there exists an $i$ such that $e_2(L_2^i) \subseteq e_1(R_1)$ holds. We remark that the latter condition could be
relaxed further to lead to additional compositions for which left and right sides are disjoint unions.

The composition of $p_1 = (L_1, K_1, R_1)$ and $p_2 = (L_2, K_2, R_2)$ now yields $p_2 \circ p_1 = (\{L_1, L_2^j\}, K_3, R_3)$,
where $L_2^j$ is an unmatched component of $L_2$ (cf. Figure 2). Note that the right graph $R_3$ can no longer be
regarded simply as a rewritten version of $R_1$ because rule $p_2$ now adds additional vertices to both the
left and the right graph. The composite context $K_3$ contains only subsets of $K_1$ and $K_2$, but it is
expanded by the vertices of $L_1$ and the edges of $L_1$ that remain unchanged under rule $p_2$.
Fig. 4: The (partial) composition of two rules is mediated by the dependency graph $E$ and the two matching
morphisms $e_1$ and $e_2$. Since these are subgraph isomorphisms in our case, $E$ is simply the union $e_1(R_1) \cup e_2(L_2)$.
The (partial) match $e_1(R_1) \cap e_2(L_2)$ can be understood as a matching $\mu$ between $R_1$ and $L_2$, i.e., as a 1-1 relation of
the matching nodes and edges. Whenever an edge is matched, then so are its incident vertices.



An example of a full rule composition is shown in Fig. 3. The two rules in the example, which
in this case are also chemical reactions, are part of the Formose grammar. The Formose grammar
consists of two pairs of rules. The first pair of rules (from now on denoted as $p_0$ and $p_1$) implements
both directions of the keto-enol tautomerism. One direction, $p_1$, is visualized in Fig. 3. The second
pair (from now on denoted as $p_2$ and $p_3$) is the aldol addition and its reverse, respectively. The reverse
($p_3$) is also visualized in Fig. 3. We see that the left side of $p_1$ is isomorphic to a subgraph of one
of the components of the right side of $p_3$. Composing the two rules by subgraph matching yields a
third rule, $p_1 \circ p_3$.

In general, we require here that the connected components of $R_1$ and $L_2$ satisfy either $e_2(L_2^i) \subseteq e_1(R_1^j)$
or $e_1(R_1^j) \cap e_2(L_2^i) = \emptyset$. We furthermore exclude the trivial case of parallel rules in which only
the second alternative is realized. In the other extreme, if all components $L_2^i$ satisfy $e_2(L_2^i) \subseteq e_1(R_1)$,
the partial composition becomes a full composition. Formally, these alternatives are described
by different dependency graphs $E$ and/or different morphisms $e_1$ and $e_2$. Pragmatically we can
understand this as a matching $\mu$ of $L_2$ and $R_1$ as in Fig. 4. Specifying $\mu$ of course removes the
ambiguity from the definition of the rule composition; hence we write $p_2 \circ_\mu p_1$ to emphasize the
matching $\mu$.

3 Constructing Rule Compositions 

Given two rules, $p_1$ and $p_2$, it is not only interesting to know if a partial composition is defined,
but also to create the set of all possible compositions

$\{p_2 \circ_{\mu_1} p_1,\; p_2 \circ_{\mu_2} p_1,\; \dots,\; p_2 \circ_{\mu_k} p_1\}$

explicitly. This set in particular contains also all full compositions. The following describes an 
algorithm for enumerating all partial compositions. 
3.1 Enumerating the Matchings $\mu$



The key to finding all compositions is the enumeration of all matchings $\mu$ that respect our restrictions
on overlaps between connected components. We thus start from the sets $\{R_1^1, \dots, R_1^{\#R_1}\}$
and $\{L_2^1, L_2^2, \dots, L_2^{\#L_2}\}$ of connected components of $R_1$ and $L_2$, respectively. In the first step we find all
subgraph matches $L_2^i \subseteq R_1^j$ (represented as the corresponding matchings $\mu_{ij}$) and arrange the result
in a matrix of lists of subgraph matches, see Fig. 5a.



The matching matrix is extended by a virtual column to account for the possibility that $L_2^i$
is not matched with any component of $R_1$. Every partial (and full) composition is now defined
by a selection of one submatch from each row of the matrix, see Supplemental Material for an
example. The converse is not true, however: Not every selection of matches corresponds to a partial
composition. In particular, we exclude the case that only entries from the virtual column are
selected. In addition, the sub-matches must be disjoint to ensure that the combined match is
injective. The latter condition needs to be checked only when more than one submatch is selected
from the same column.
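
The selection procedure can be sketched in a few lines of Python. The data layout is an assumption for illustration: each matrix entry is a list of matches, and each match is represented only by the set of $R_1$ vertices it covers. The concrete vertex sets below are hypothetical and merely chosen so that the numbers of matches per entry resemble a small example with two components of $L_2$ and three components of $R_1$.

```python
from itertools import product

UNMATCHED = None  # marker for the virtual column


def enumerate_selections(match_matrix):
    """Yield all valid selections of one entry per row of the extended matrix.

    match_matrix[i][j] is the list of matches of component i of L2 in
    component j of R1; a match is a frozenset of covered R1 vertices.
    """
    rows = []
    for row in match_matrix:
        options = [(j, match) for j, matches in enumerate(row) for match in matches]
        options.append((UNMATCHED, frozenset()))  # virtual column
        rows.append(options)
    for selection in product(*rows):
        used = [match for j, match in selection if j is not UNMATCHED]
        if not used:
            continue  # only virtual entries selected: excluded
        if sum(len(match) for match in used) != len(frozenset().union(*used)):
            continue  # overlapping submatches: combined match not injective
        yield selection


# Hypothetical matches: rows are components of L2, columns components of R1.
matrix = [
    [[frozenset({0, 1})], [], [frozenset({4, 5}), frozenset({6, 7})]],
    [[], [frozenset({2, 3})], [frozenset({7, 8})]],
]
selections = list(enumerate_selections(matrix))
print(len(selections))
```

For simplicity the sketch tests disjointness for every selection rather than only for entries taken from the same column; since matches in different components of $R_1$ cannot overlap, the result is the same.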



3.2 Composing the Rules 

The construction of the composition $p_2 \circ_\mu p_1$ of two rules $p_1$ and $p_2$ does not explicitly depend on
the component structure of $R_2$ and $L_1$ because it is uniquely defined by the matching $\mu$ and the
bijections of the nodes of $L_i$, $K_i$, and $R_i$ for each of the two rules. We obtain $L$ by extending $L_1$
with the unmatched components of $L_2$ and $R$ by extending $R_2$ with the unmatched components of $R_1$.
The corresponding extension of $\mu$ to a bijection $\bar{\mu}$ of the vertex sets of $L$ and $R$ is uniquely defined.
The context $K$ of the composite rule simply consists of the common vertex set of $L$ and $R$ and all
edges $(x, y)$ of $L$ for which $(\bar{\mu}(x), \bar{\mu}(y))$ is an edge in $R$. We note in passing that $\bar{\mu}$ defines the atom
mapping of the composite transformation. The explicit construction of $(L, K, R)$ is summarized as
Algorithm 3.1.



The implementation of the algorithm naturally depends heavily on the representation of trans- 
formation rules, which in our implementation is the representation from the Graph Grammar 
Library (GGL) [6]. The representation is a single graph, with attached vertex and edge properties 
defining membership of L, K and R, as well as the needed labels. 

Not all matchings define a valid rule composition. For instance, consider an edge $(u, v)$ that is
present in $R_1$ and $R_2$ but not in $L_2$, where both $u$ and $v$ are in $L_2$. This would amount to creating
an edge by means of rule $p_2$ that was already introduced by $p_1$. Since we do not allow parallel
edges, we regard such inconsistencies as undefined cases and reject the matching. Note that
a parallel edge does not correspond to a "double bond" (which essentially is only an edge with a
specific type).





(a) Match matrix

            R_1^1   R_1^2   R_1^3
    L_2^1     1       -       2
    L_2^2     -       1       1

(b) Extended match matrix

            R_1^1   R_1^2   R_1^3   R_1^∅
    L_2^1     1       -       2       1
    L_2^2     -       1       1       1

Fig. 5: Example of a match matrix and the same matrix with its virtual extension. The top row specifies 1 possibility
for $L_2^1 \subseteq R_1^1$ and 2 for $L_2^1 \subseteq R_1^3$. The extended matrix further specifies that $L_2^1$ can be unmatched. The bottom
rows can be interpreted similarly. We display the number of matchings instead of a representation of the matchings
themselves.
Algorithm 3.1: Composing $p_1$ and $p_2$ to $p$, by a given partial mapping
Input: $p_1 = (L_1, K_1, R_1)$
Input: $p_2 = (L_2, K_2, R_2)$
Input: $\mu$, a partial matching between $L_2$ and $R_1$
Output: $p = (L, K, R)$

    p ← empty rule
    Copy vertices of p_1 to p
    foreach vertex v ∈ p_2 do
        if v is not mapped by μ then
            Copy v to p
        else
            Change membership in L, K and R for vertex μ(v)
    Copy edges of p_1 to p
    foreach edge e ∈ p_2 do
        if e is not mapped by μ then
            Copy e to p
        else
            Change membership in L, K and R for edge μ(e)
    Delete edges and vertices created by p_1, but deleted by p_2
    if matching condition not satisfied then abort
    return p


3.3 Graph Binding 

The composition of transformation rules, and thereby chemical reactions, makes it possible to create 
abstract meta-rules in a way that is similar to the combination of multiple functions into more 
abstract functions in functional programming. A related concept from (functional) programming 
that seems useful in the context of graph grammars is partial function application. Consider, for 
example, the binding of the number 2 to the exponentiation operator, yielding either the function 
$f(x) = 2^x$ or $f(x) = x^2$. In the framework of rule composition, we define graph binding as a special
case. 

Let $G$ be a graph and $p_2 = (L_2, K_2, R_2)$ be a transformation rule. The binding of $G$ to $p_2$ results
in the transformation rule $p = (L, K, R)$ which implements the partial application of $p_2$ on $G$. This
is accomplished simply by regarding $G$ as a rule $p_1 = (\emptyset, \emptyset, G)$, and using partial composition;
$p = p_2 \circ p_1$. Note that if $p_2 \circ p_1$ is a full composition, then $p$ can be regarded as a graph $H$ and
$G \stackrel{p_2}{\Longrightarrow} H$ holds.
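
The programming analogy can be made concrete with Python's `functools.partial`. The function `power` and the helper `graph_as_rule` are illustrative names introduced here; the latter only mimics the idea of regarding a graph as a rule with empty left side and context, using a hypothetical set-of-edges representation.

```python
from functools import partial


def power(base, exponent):
    return base ** exponent


# Binding the number 2 to either argument of the exponentiation operator.
two_to_the = partial(power, 2)        # f(x) = 2^x
square = partial(power, exponent=2)   # f(x) = x^2


def graph_as_rule(edges):
    """Regard a graph as the rule (empty, empty, G), ready for partial composition."""
    return {"L": frozenset(), "K": frozenset(), "R": frozenset(edges)}


# Hypothetical encoding of a water molecule as a set of labeled bonds.
water = {("O", "H1"), ("O", "H2")}
water_rule = graph_as_rule(water)
print(two_to_the(5), square(5), water_rule["R"] == frozenset(water))
```

Just as `two_to_the` and `square` are specializations of `power`, a rule with a bound graph is a specialization of the original rule in which one of its inputs is fixed.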

Graph binding allows a simplified representation of reactions. For instance, we can use this 
formal construction to omit uninteresting ubiquitously present molecules such as water by binding 
the graph of the water molecule to the transformation rule of a reaction that requires water. 
Similarly, graph unbinding can be defined as a transformation rule that destroys graphs. In a 
chemical application it can be used to avoid the explicit representation of uninteresting ubiquitous 
molecules such as the solvent. 

3.4 Ordering Rules 

A wide variety of methods, including flux balance analysis, can be used to identify pathways or 
other subsets of reactions that are of interest. Adjacency of reactions in the original networks as well 
Fig. 6: Above: Chemical reaction network for the Formose reaction; hyperedges are labeled with $(i, p_j)$ where $i$ is
the $i$-th reaction $\rho_i$ in the rule composition, and $p_j$, $0 \le j \le 3$, refers to a specific rule from the Formose reaction; Right:
Resulting composed rule after the composition of the first $i$ rules along the Formose cycle, context shown in black.



as their directionality can be used efficiently to prune the possible orders of rule compositions. The 
fact that multiple reactions are instantiations of the same transformation rule, as in the example 
discussed in detail in the next section, further reduces the search spaces. 

4 Results and Discussion 

We illustrate the use of transformation rule composition by deriving meta-rules from the graph
grammar consisting of the four rules necessary to represent the complete Formose reaction, see
Fig. 3. The overall reaction pattern of the Formose cycle is $2 g_0 + g_1 \to 2 g_1$ with $g_0$ being formaldehyde
and $g_1$ being glycolaldehyde. It amounts to the linear combination $\sum_{i=1}^{9} \rho_i$ of the eight
reactions and the influx $\rho_1$ of $g_0$ listed in Fig. 3. It is important to notice that several of these
reactions are instantiations of the same, well-known chemical transformations. We have forward
keto-enol tautomerism ($p_0$: $\rho_2$, $\rho_4$, $\rho_6$), backward keto-enol tautomerism ($p_1$: $\rho_7$, $\rho_9$), forward
aldol addition ($p_2$: $\rho_3$, $\rho_5$), and backward aldol addition ($p_3$: $\rho_8$). The composite rule models the
complete autocatalytic cycle shown in Fig. 6 as a single meta-rule.

Throughout this section we will not explicitly distinguish between partial composition and 
full composition, and we interpret the composition operator o as right-associative to simplify the 
notation. Thus pi o pj o p k means pi o (pj o p k ). 

The rules are used in the autocatalytic cycle in the following order (starting with a keto-enol
tautomerisation $p_0$):

$p_0,\; p_2,\; p_0,\; p_2,\; p_0,\; p_1,\; p_3,\; p_1$

As it is not possible to compose this sequence of rules directly, we start by binding glycolaldehyde
$g_1$ to reaction $p_0$, as the before-mentioned keto-enol tautomerisation is applied to molecule $g_1$. The
resulting rule is denoted as $g_1$. The hyperedges in the chemical reaction network depicted in Fig. 6
are numbered according to the sequence that reflects in which order the Formose reaction takes
place and consequently the order in which the rule composition subsequently is done. The first
composition refers to the binding operation. This binding of glycolaldehyde results in a graph
grammar rule, which is depicted in row 1 of the table in Fig. 6, i.e., the rule $(\emptyset, \emptyset, g_1)$
(see "Graph Binding"). The numbers at the hyperedges (2, 3, ..., 9) refer to the second, third, ...,
ninth reaction in the sequence of reactions given above. The graph grammar rule $p_i$, $0 \le i \le 3$,
used for the corresponding hyperedge is given next to the sequence number. The rules inferred by
a subsequent rule composition are given in rows 2 to 9 of the table.

The application of the final rule results in the composed meta-rule $p_1 \circ p_3 \circ \dots \circ p_0 \circ g_1$. This
rule precisely covers the reaction pattern of the Formose reaction, namely how two formaldehyde
molecules and one (bound) glycolaldehyde are transformed to two glycolaldehyde molecules. Note,
however, that the rule is general enough such that any pair of molecules with aldehyde groups
can be used, i.e., the inferred reaction pattern refers to a class of overall reactions and the product
does not necessarily need to be glycolaldehyde.

The practical computation of these compositions takes less than a second in the current imple- 
mentation. Even for substantially more general composition sequences the running time remains 
manageable. For instance, it takes less than 1 minute to compute all composition sequences with
a length $k < 10$ of the form $p_{i_1} \circ p_{i_2} \circ \dots \circ p_{i_k} \circ g_q$ with $i_j \in \{0, 1, 2, 3\}$, based on the binding of one
of the influx molecules $g_0$ or $g_1$. This results in 1875 different inferred composite rules.
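
To give a feeling for the size of the candidate space, the following Python sketch enumerates index sequences of this form. It only generates candidates; whether a candidate actually defines a composite rule depends on the rule composition itself, which is abstracted away here. The function names and the string identifiers for the seed molecules are assumptions for illustration.

```python
from itertools import product


def candidate_sequences(rule_ids, seeds, max_len):
    """Yield (sequence of rule indices, bound seed molecule) candidates."""
    for k in range(1, max_len + 1):
        for sequence in product(rule_ids, repeat=k):
            for seed in seeds:
                yield sequence, seed


def count_candidates(n_rules, n_seeds, max_len):
    """Closed-form count of the candidates generated above."""
    return n_seeds * sum(n_rules ** k for k in range(1, max_len + 1))


candidates = list(candidate_sequences(range(4), ("g0", "g1"), 3))
print(len(candidates), count_candidates(4, 2, 3))
```

The number of candidates grows exponentially with the sequence length, whereas only those sequences for which every intermediate composition is defined contribute composite rules, so undefined compositions prune the search early.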

Polymerization can also be viewed as a pathway in a chemical reaction network, albeit one 
of potentially infinite size. The same methods applied to the automatic inference of the overall 
reaction pattern of the Formose cycle can be directly applied to detecting composition rules for 
polymerization reactions. Importantly, even if a chemical reaction network is not given, the ap- 
proaches presented in this paper can be used to automatically find sequences of reactions that will 
lead to polymerization. This can be realized by a straight-forward post-processing step: all that 
needs to be done is to check whether an inferred composite rule exhibits a replicated functional unit. 
Such polymerization meta-rules also enable the analysis of chemical systems with highly complex 
carbon skeletons such as the natural compound classes of the terpenes or the polyketides. 

5 Conclusions 

Graph grammars provide a convenient framework for modeling chemistries on different levels of 
abstraction. A chemically valid approach is to see any chemical reaction as a bi-molecular reaction. 
This requires graph grammar rules that cover changes of molecules in a rather explicit and detailed
way. Understanding chemical reaction patterns usually requires spanning the chemical reaction
networks based on such rules. Obviously, this approach suffers from the inherent potential for an immense
combinatorial explosion. In this paper we introduced the automatic inference of such higher-level
chemical reaction patterns based on a formal approach for graph grammar rule combination. We
analyzed the autocatalytic cycle of the Formose reaction and inferred its overall reaction pattern 
as a rule composition of nine rules. Rule composition is also naturally applicable to inferring 
patterns of polymerization reactions. Future work will include e.g. the analysis of terpene-based 
and hydrogen cyanide-based polymerization chemistry. 
Supplemental Material:

Example of Enumeration of Compositions 



In this Appendix we show the complete result of the composition of two (artificial) rules, $p_1$ and $p_2$,
including the selection of submatches from the match matrix. The two rules are depicted in Fig. 7
with the extended match matrix of the composition $p_2 \circ p_1$, which corresponds to the example of an
extended match matrix given in the paper. The rules in this section are all depicted with vertices
that have an additional index. The numbering of the components is in increasing order with respect to
these indices, e.g., $L_2^1$ denotes the component connecting nodes A,0 and B,1 and $L_2^2$ denotes the
component connecting nodes B,2 and C,3.





[Figure 7, panels (a) rule $p_1$ and (b) rule $p_2$: drawings not recoverable.]

(c) Extended match matrix

            R_1^1   R_1^2   R_1^3   R_1^∅
    L_2^1     1       -       2       1
    L_2^2     -       1       1       1

Fig. 7: The two rules $p_1$ and $p_2$, and the extended match matrix of the composition $p_2 \circ p_1$. The components of both
$R_1$ and $L_2$ are numbered in the same order as the vertex indices.

In the following we will enumerate all valid selections of submatches based on the extended
match matrix and give the corresponding resulting rule composition. The chosen matches are
depicted as • in the extended match matrix. If several matches can be found (in our example this
is true for the component $L_2^1$, which can be matched twice in $R_1^3$), the • has an index.

Composition 1 to Composition 6

[Figures: for each composition, the extended match matrix with the selected submatches marked by •, followed by
the resulting composite rule (Result of composition 1 to Result of composition 6).]

Invalid Selection

[Figure: extended match matrix with the selected submatches marked by •.]

This selection of submatches is invalid, as they are not disjoint (node B,9 would be matched twice).

Composition 7 to Composition 10

[Figures: for each composition, the extended match matrix with the selected submatches marked by •, followed by
the resulting composite rule (Result of composition 7 to Result of composition 10).]